Comprehensive Machine Learning Analysis of Student Academic Performance
Dataset: 9000 students | 10 original features | 7 Grade categories
Total Students: 9000 | Models Trained: 13 | Best Accuracy: 100.0% | 🏆 Best Model: Decision Tree
📋 1. Dataset Overview
The dataset contains 9000 student records from Martin Luther School
with scores in Math, Physics, and Chemistry (range: 10–100).
Grade Distribution

Grade | Count | Percentage | Description
A+    |   49  |  0.5%      | Excellent Performance
A     |  360  |  4.0%      | Very Good Achievement
B+    | 1014  | 11.3%      | Good Performance
B     | 1797  | 20.0%      | Average Performance
C     | 2187  | 24.3%      | Below Average Achievement
D     | 2887  | 32.1%      | Poor Performance
F     |  706  |  7.8%      | Failed
Key Insight: The dataset is imbalanced — Grade D is the most common (32.1%),
while A+ is extremely rare (0.5%). This class imbalance affects model performance,
especially for minority classes.
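The imbalance can be quantified directly from the counts in the table above; a minimal sketch in pure Python (counts copied from the distribution table):

```python
# Grade counts from the distribution table above (9000 students total)
grade_counts = {
    "A+": 49, "A": 360, "B+": 1014, "B": 1797,
    "C": 2187, "D": 2887, "F": 706,
}

total = sum(grade_counts.values())                   # 9000
shares = {g: n / total for g, n in grade_counts.items()}

# Imbalance ratio: most common class vs. rarest class
majority = max(grade_counts, key=grade_counts.get)   # "D"
minority = min(grade_counts, key=grade_counts.get)   # "A+"
imbalance = grade_counts[majority] / grade_counts[minority]

print(f"{majority} share: {shares[majority]:.1%}")   # D share: 32.1%
print(f"{minority} share: {shares[minority]:.1%}")   # A+ share: 0.5%
print(f"imbalance ratio: {imbalance:.0f}:1")         # imbalance ratio: 59:1
```

A roughly 59:1 majority-to-minority ratio is severe enough that stratified splits and class-weighted metrics (macro F1 rather than accuracy alone) are worth considering.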
📊 2. Exploratory Data Analysis
Interpretation: The grade distribution is roughly bell-shaped but skewed toward
lower grades. D is the most frequent grade, suggesting many students struggle across subjects.
The rare A+ class (49 students) will be hardest for models to predict.
Interpretation: All three subjects show approximately uniform distributions
across the 10–100 range, with means around 53–56. No subject appears inherently harder or easier.
Interpretation: Math, Physics, and Chemistry scores show near-zero correlation
with each other (~0.00), meaning student performance in one subject is independent of others.
This is an interesting finding — performing well in Math doesn't predict Physics or Chemistry scores.
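The near-zero pairwise correlation is easy to sanity-check. A sketch using synthetic stand-in data (independent uniform scores matching the reported shape; the real dataset is not included here):

```python
import numpy as np

# Simulate 9000 students with independent uniform scores in [10, 100]
rng = np.random.default_rng(42)
scores = rng.uniform(10, 100, size=(9000, 3))  # columns: Math, Physics, Chemistry

corr = np.corrcoef(scores, rowvar=False)       # 3x3 correlation matrix
off_diag = corr[np.triu_indices(3, k=1)]       # the three pairwise correlations

print(np.round(off_diag, 3))                   # all near 0 for independent columns
```

With n = 9000 the sampling noise on a correlation coefficient is about 1/√n ≈ 0.01, so observed values near 0.00 are consistent with true independence.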
🤖 3. Classification Model Results (Grade Prediction)
We trained 13 classification models to predict student grades
from their Math, Physics, and Chemistry scores plus engineered features.
#  | Model                       | Accuracy | Precision | Recall | F1 Score
1  | Decision Tree               | 1.0000   | 1.0000    | 1.0000 | 1.0000
2  | Random Forest               | 1.0000   | 1.0000    | 1.0000 | 1.0000
3  | Gradient Boosting           | 1.0000   | 1.0000    | 1.0000 | 1.0000
4  | Gradient Boosting (Tuned)   | 1.0000   | 1.0000    | 1.0000 | 1.0000
5  | Voting Ensemble             | 1.0000   | 1.0000    | 1.0000 | 1.0000
6  | Stacking Ensemble           | 1.0000   | 1.0000    | 1.0000 | 1.0000
7  | Random Forest (Tuned)       | 0.9994   | 0.9995    | 0.9994 | 0.9994
8  | SVM (linear)                | 0.9917   | 0.9918    | 0.9917 | 0.9916
9  | MLP Neural Network          | 0.9872   | 0.9873    | 0.9872 | 0.9872
10 | SVM (rbf)                   | 0.9739   | 0.9742    | 0.9739 | 0.9738
11 | Logistic Regression         | 0.9711   | 0.9716    | 0.9711 | 0.9702
12 | KNN (k=5)                   | 0.9467   | 0.9469    | 0.9467 | 0.9466
13 | Naive Bayes                 | 0.9439   | 0.9458    | 0.9439 | 0.9442
🏆 Best Model: Decision Tree
Accuracy: 1.0000 | F1 Score: 1.0000
Six models tie at a perfect score, and this is expected rather than remarkable: the grade labels are a deterministic step function of the total/average score, and the engineered features (Total_Score, Average_Score) expose that rule directly. Tree-based models simply recover the exact grade cutoffs, so the 100% accuracy reflects the rule-based labeling, not an unusually powerful classifier.
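The perfect scores can be reproduced on synthetic data. A minimal sketch in which grades are a step function of Total_Score (the cutoffs below are illustrative assumptions, not the dataset's real grade boundaries):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
scores = rng.uniform(10, 100, size=(9000, 3))       # Math, Physics, Chemistry
total = scores.sum(axis=1)

# Hypothetical grade cutoffs on Total_Score (7 bands, F through A+)
bins = [90, 120, 150, 180, 210, 240]
grades = np.digitize(total, bins)

# Include the engineered Total_Score feature, as the report does
X = np.column_stack([scores, total])
X_tr, X_te, y_tr, y_te = train_test_split(X, grades, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(f"test accuracy: {tree.score(X_te, y_te):.3f}")  # close to 1.0
```

The tree's learned split points land between the training examples nearest each true cutoff, so held-out accuracy approaches 1.0 as the dataset grows.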
🔢 4. Confusion Matrices
Interpretation: Confusion matrices show where models make mistakes.
The diagonal represents correct predictions. Most errors occur between adjacent grades
(e.g., B vs B+, C vs D), which is expected since these grades have overlapping score ranges.
The rare A+ class is often misclassified due to limited training examples.
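As a minimal illustration of how such a matrix is read (toy labels, not the report's actual results, with one adjacent-grade mix-up):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["B+", "B", "C"]
y_true = ["B+", "B", "B", "C", "C", "B+"]
y_pred = ["B+", "B", "B+", "C", "C", "B+"]   # one true B predicted as B+

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# Rows = true class, columns = predicted class;
# the diagonal counts correct predictions, off-diagonal cells are confusions
correct = np.trace(cm)
print(f"correct: {correct}/{len(y_true)}")   # correct: 5/6
```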
📈 5. ROC Curves (Pass/Fail Classification)
Interpretation: ROC curves show the tradeoff between true positive rate and
false positive rate for binary Pass/Fail classification. All models achieve high AUC scores,
indicating that distinguishing between passing and failing students is relatively straightforward
based on score features. Models with AUC > 0.90 are considered excellent classifiers.
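A sketch of the pass/fail AUC computation on synthetic data; the pass rule (average score ≥ 40) is an assumption for illustration, as the report does not state its threshold:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
scores = rng.uniform(10, 100, size=(9000, 3))
passed = (scores.mean(axis=1) >= 40).astype(int)   # hypothetical pass rule

X_tr, X_te, y_tr, y_te = train_test_split(scores, passed, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.3f}")   # well above the 0.90 "excellent" bar
```

Because a threshold on the average score is a linear rule in the three features, even logistic regression ranks students almost perfectly here.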
🎯 6. Feature Importance Analysis
Key Findings:
- Total_Score and Average_Score are the most important features, as grades are primarily determined by combined performance across all subjects.
- Min_Score is also highly important — a very low score in any subject can significantly lower the overall grade.
- Individual subject scores (Math, Physics, Chemistry) contribute roughly equally, confirming that no single subject dominates grade determination.
- Score_Range and Score_Std capture the consistency of performance — students with high variance across subjects tend to receive different grades than consistently performing students.
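How such importances are obtained from a fitted forest, sketched on synthetic data (the grade bands below are illustrative, and only a subset of the report's engineered features is included):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
scores = rng.uniform(10, 100, size=(2000, 3))
total = scores.sum(axis=1)
grades = np.digitize(total, [120, 180, 240])   # 4 coarse bands, for illustration

features = ["Math", "Physics", "Chemistry", "Total_Score"]
X = np.column_stack([scores, total])

rf = RandomForestClassifier(n_estimators=100, random_state=2).fit(X, grades)

# Impurity-based importances sum to 1; the deterministic Total_Score dominates
ranked = sorted(zip(features, rf.feature_importances_), key=lambda t: -t[1])
for name, imp in ranked:
    print(f"{name:12s} {imp:.3f}")
```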
🔄 7. Cross-Validation Results
Interpretation: Cross-validation provides a more reliable estimate of model
performance by training and testing on different data splits. Low standard deviation in CV
scores indicates stable, reliable models. Gradient Boosting and Random Forest typically show
the best balance of high accuracy and low variance.
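The CV procedure itself is a one-liner in scikit-learn; a sketch on synthetic data with the same hypothetical threshold-based grade labels as above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
scores = rng.uniform(10, 100, size=(1000, 3))
grades = np.digitize(scores.sum(axis=1), [120, 180, 240])  # illustrative bands

clf = RandomForestClassifier(n_estimators=50, random_state=3)
cv_scores = cross_val_score(clf, scores, grades, cv=5)     # 5-fold accuracy

# A low std across folds indicates a stable model
print(f"mean={cv_scores.mean():.3f}  std={cv_scores.std():.3f}")
```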
📚 8. Learning Curves Analysis
Interpretation:
- Training and validation curves converging at a high score → the model generalizes well
- A large gap between training and validation → overfitting (the model memorizes training data)
- Both curves plateauing at a low score → underfitting (the model is too simple)
- Random Forest may show signs of slight overfitting (high training score, lower validation score)
- Logistic Regression's curves converge quickly, suggesting a simpler but stable model
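The train/validation gap described above can be computed with scikit-learn's learning_curve utility; a sketch on synthetic pass/fail data (the labeling rule is an assumption):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(4)
X = rng.uniform(10, 100, size=(1500, 3))
y = (X.mean(axis=1) >= 55).astype(int)     # hypothetical pass/fail label

sizes, train_sc, val_sc = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

# A shrinking train-validation gap as data grows means no serious overfitting
gap = train_sc.mean(axis=1) - val_sc.mean(axis=1)
print("train:", np.round(train_sc.mean(axis=1), 3))
print("val:  ", np.round(val_sc.mean(axis=1), 3))
print("gap:  ", np.round(gap, 3))
```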
🔮 9. Clustering Results (Unsupervised Learning)
Interpretation: K-Means clustering reveals natural groupings in the data
based on score patterns. The PCA visualization shows that grade labels roughly correspond
to clusters in the reduced feature space, but there is significant overlap between adjacent
grades. DBSCAN identifies the core dense regions and outlier students with unusual score combinations.
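A sketch of the K-Means + PCA pipeline on synthetic scores (k = 7 mirrors the seven grade categories; the data here is a uniform stand-in, so clusters partition the cube rather than reflect real groupings):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
scores = rng.uniform(10, 100, size=(2000, 3))

km = KMeans(n_clusters=7, n_init=10, random_state=5).fit(scores)

pca = PCA(n_components=2).fit(scores)
coords = pca.transform(scores)             # 2-D coordinates for the scatter plot

print("cluster sizes:", np.bincount(km.labels_))
print("variance kept by 2 PCs:", round(pca.explained_variance_ratio_.sum(), 3))
```

For isotropic 3-D data the two leading components keep about two thirds of the variance, which is why the PCA scatter can only show overlapping, approximate cluster boundaries.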
📈 10. Regression Model Results
Model                       | R²     | MAE    | RMSE
Linear Regression           | 1.0000 | 0.0000 | 0.0000
Ridge Regression            | 1.0000 | 0.0000 | 0.0000
Lasso Regression            | 1.0000 | 0.0052 | 0.0064
Random Forest Regressor     | 0.9980 | 0.5199 | 0.6724
Gradient Boosting Regressor | 0.9969 | 0.6497 | 0.8298
Interpretation: Linear Regression achieves perfect R² = 1.000 for predicting
Total_Score from individual subject scores because Total_Score = Math + Physics + Chemistry
(a perfect linear relationship). For Average_Score prediction, tree-based regressors capture
non-linear patterns slightly better than linear models.
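The perfect linear fit is easy to verify: since Total_Score is by definition the sum of the three subject scores, ordinary least squares recovers coefficients of exactly 1. A sketch on synthetic scores:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(6)
scores = rng.uniform(10, 100, size=(9000, 3))   # Math, Physics, Chemistry
total = scores.sum(axis=1)                      # Total_Score, by definition

reg = LinearRegression().fit(scores, total)
r2 = r2_score(total, reg.predict(scores))

print(np.round(reg.coef_, 6), round(reg.intercept_, 6))  # ≈ [1, 1, 1], 0
print(f"R² = {r2:.6f}")                                  # R² = 1.000000
```

Tree-based regressors cannot represent this sum exactly (they predict piecewise-constant values), which explains their small but nonzero MAE and RMSE in the table.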
🎯 11. Key Conclusions & Recommendations
Summary of Findings:
- Best classification model: Decision Tree with 100.0% accuracy
- Subject scores are independent — Math, Physics, and Chemistry show ~0 pairwise correlation
- Grade is determined by the total/average score, not by any single subject
- Class imbalance affects prediction of rare grades (A+ and F)
- Ensemble methods (Voting, Stacking) provide robust predictions